On Low-Risk Heavy Hitters and Sparse Recovery Schemes
Abstract
We study the heavy hitters and related sparse recovery problems in the low failure probability regime. This regime is not well understood and has only been studied for non-adaptive schemes. The main previous work is on sparse recovery by Gilbert et al. (ICALP'13). We identify an error in their analysis, improve their results, and contribute new non-adaptive and adaptive sparse recovery algorithms, as well as upper and lower bounds for the heavy hitters problem with low failure probability. Our results are summarized as follows:

1. (Heavy Hitters) We study three natural variants of finding heavy hitters in the strict turnstile model, where the variants differ in the quality of the desired output. For the weakest variant, we give a randomized algorithm improving the failure probability analysis of the ubiquitous Count-Min data structure. We also give a new lower bound for deterministic schemes, resolving Question 4 from the IITK Workshop on Algorithms for Data Streams (2006). Finally, for the strongest and well-studied ℓ∞/ℓ2 variant, we provide the first randomized lower bound that is simultaneously optimal in the approximation factor ε, the universe size n, and the failure probability δ, for the full range of these parameters. Our lower bound shows that the classical Count-Sketch data structure is optimal in all parameters.

2. (Sparse Recovery Algorithms) For non-adaptive sparse recovery, we give sublinear-time algorithms with low failure probability, improving upon Gilbert et al. (SICOMP'12) and Gilbert et al. (ICALP'13). In the adaptive case, we improve the failure probability from a constant, due to Indyk et al. (FOCS'11), to e^(-k^0.99), where k is the sparsity parameter.

3. (Optimal Average-Case Sparse Recovery Bounds) We give matching upper and lower bounds in all parameters on the measurement complexity of the ℓ2/ℓ2 sparse recovery problem in the spiked-covariance model, completely settling its complexity in this model.
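Since the first result improves the failure-probability analysis of Count-Min, a small illustration may help. Below is a minimal Count-Min sketch in Python; the class name, the seeded tuple hashing, and all parameters are our own simplifications (real analyses assume pairwise-independent hash families), not the construction from the paper.

```python
import random

class CountMin:
    """Toy Count-Min sketch: depth rows of width counters.
    A point query overestimates f_i by at most (e/width) * ||f||_1
    with probability at least 1 - e^(-depth)."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, item, row):
        # Stand-in hash; real schemes use pairwise-independent families.
        return hash((self.salts[row], item)) % self.width

    def update(self, item, delta=1):
        # Turnstile update: f_item += delta (deletions allowed).
        for r in range(self.depth):
            self.table[r][self._bucket(item, r)] += delta

    def query(self, item):
        # In the strict turnstile model (all f_i >= 0), the minimum
        # over rows upper-bounds the true frequency.
        return min(self.table[r][self._bucket(item, r)]
                   for r in range(self.depth))

cm = CountMin(width=256, depth=4)
for x in [1, 1, 1, 2, 3, 3]:
    cm.update(x)
cm.update(2, -1)                 # a deletion, as the turnstile model allows
print(cm.query(1), cm.query(2))  # -> 3 0 (barring hash collisions)
```

Count-Sketch, whose optimality the abstract's lower bound establishes, differs by multiplying each update by a random ±1 sign per row and taking a median rather than a minimum at query time.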
Similar resources
Deterministic Heavy Hitters with Sublinear Query Time
This paper studies the classic problem of finding heavy hitters in the turnstile streaming model. We give the first deterministic linear sketch that has O(ε^-2 log n · log*(ε^-1)) rows and answers queries in sublinear time. The number of rows is only a factor of log*(ε^-1) more than that used by the state-of-the-art algorithm prior to our paper, due to Nelson, Nguyen and Woodruff (RANDOM'12). Their...
An Optimal Algorithm for ℓ1-Heavy Hitters in Insertion Streams and Related Problems
We give the first optimal bounds for returning the ℓ1-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of m items in {1, 2, ..., n} and parameters 0 < ε < φ ≤ 1, let f_i denote the frequency of item i, i.e., the number of times item i occurs in the stream. With arbitrarily large constant probab...
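The quoted result concerns optimal randomized algorithms; as a concrete baseline only, here is the classical deterministic Misra-Gries counter algorithm for ℓ1-heavy hitters in insertion streams (plainly not the algorithm of the paper above; the function and parameter names are ours).

```python
def misra_gries(stream, k):
    """Classical Misra-Gries summary with at most k-1 counters.
    Every item i with f_i > m/k ends with a counter, and each kept
    count undershoots the true frequency f_i by at most m/k."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No room: decrement everything, dropping zeroed counters.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters

print(misra_gries([1, 1, 2, 1, 3, 1, 4], k=3))
# -> {1: 3, 4: 1}: the dominant item 1 survives, undercounted by at most m/k
```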
Breaking the Variance: Approximating the Hamming Distance in Õ(1/ε) Time Per Alignment
The algorithmic task of computing the Hamming distance between a given pattern of length m and each location in a text of length n is one of the most fundamental in string algorithms. Unfortunately, there is evidence that for a text T of size n and a pattern P of size m, one cannot compute the exact Hamming distance at all locations in T in time less than Õ(n√m). ...
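As a point of comparison for the Õ(n√m) barrier mentioned above, here is the direct O(nm)-time baseline (the function name is ours):

```python
def hamming_all_alignments(text, pattern):
    """Exact Hamming distance of pattern against every alignment of
    text, by direct comparison: O(n*m) time for |text|=n, |pattern|=m."""
    n, m = len(text), len(pattern)
    return [sum(text[i + j] != pattern[j] for j in range(m))
            for i in range(n - m + 1)]

print(hamming_all_alignments("abracadabra", "abr"))
# [0, 3, 3, 2, 3, 2, 3, 0, 3] -- zeros at the two exact matches
```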
Differentially Private Continual Monitoring of Heavy Hitters from Distributed Streams
We consider application scenarios where an untrusted aggregator wishes to continually monitor the heavy hitters across a set of distributed streams. Since each stream can contain sensitive data, such as the purchase history of customers, we wish to guarantee the privacy of each stream, while allowing the untrusted aggregator to accurately detect the heavy hitters and their approximate frequenc...
Comparison between multistage filters and sketches for finding heavy hitters
The purpose of this write-up is to compare multistage filters [3] and sketches with respect to their ability to identify heavy hitters. In a nutshell, the conclusion is that multistage filters as I use them identify heavy hitters with less memory than sketches, but some sketches support important other operations, more specifically they can be added and subtracted without any need to re-read th...
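The "added and subtracted" property the write-up refers to is just linearity of sketches. A toy one-row sketch makes it concrete (the modulo hash and all names below are illustrative, not any particular scheme):

```python
# A one-row toy "sketch": w counters, items hashed by modulo.
# Linearity: sketch(A) - sketch(B) equals the sketch of the stream
# whose frequencies are f_A - f_B, with no need to re-read the data.
w = 8
def sketch(stream):                      # stream: list of (item, delta)
    counters = [0] * w
    for item, delta in stream:
        counters[item % w] += delta
    return counters

A = [(1, +1), (1, +1), (2, +1)]          # item 1 twice, item 2 once
B = [(2, +1), (3, +1)]                   # item 2 once, item 3 once
diff = [u - v for u, v in zip(sketch(A), sketch(B))]
assert diff == sketch([(1, +2), (3, -1)])
```

Multistage filters, by contrast, apply conditional (conservative) updates, which breaks this linearity; that is the trade-off the write-up describes.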
Journal: CoRR
Volume: abs/1709.02919
Published: 2017